Binarized Neural Machine Translation

Neural Information Processing Systems

The rapid scaling of language models is motivating research into low-bitwidth quantization. In this work, we propose the first binarization technique for Transformer-based machine translation (BMT).


ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation

Neural Information Processing Systems

We present ComSL, a speech-language model built atop a composite architecture of public pretrained speech-only and language-only models and optimized in a data-efficient manner for spoken language tasks.



When does label smoothing help?

Rafael Müller, Simon Kornblith, Geoffrey E. Hinton

Neural Information Processing Systems

To explain these observations, we visualize how label smoothing changes the representations learned by the penultimate layer of the network. We show that label smoothing encourages the representations of training examples from the same class to group in tight clusters. This results in loss of information in the logits about resemblances between instances of different classes, which is necessary for distillation, but does not hurt generalization or calibration of the model's predictions.
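For concreteness, label smoothing replaces a one-hot training target with a mixture of that target and the uniform distribution over the K classes; a minimal sketch (function name and epsilon value are illustrative):

```python
def smooth_labels(one_hot, eps=0.1):
    """Mix a one-hot target with the uniform distribution over K classes.

    Each entry becomes (1 - eps) * y + eps / K, so the target stays a
    valid probability distribution but is no longer exactly 0/1.
    """
    k = len(one_hot)
    return [(1.0 - eps) * y + eps / k for y in one_hot]

# 4-class example: the true class is index 2
target = [0.0, 0.0, 1.0, 0.0]
smoothed = smooth_labels(target, eps=0.1)  # [0.025, 0.025, 0.925, 0.025]
```

Training against the smoothed target discourages the network from producing extremely confident logits, which is what drives the tight within-class clustering the abstract describes.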


Layer-Wise Coordination between Encoder and Decoder for Neural Machine Translation

Tianyu He, Xu Tan, Yingce Xia, Di He, Tao Qin, Zhibo Chen, Tie-Yan Liu

Neural Information Processing Systems

Neural Machine Translation (NMT) has achieved remarkable progress with the rapid evolution of model structures. In this paper, we propose the concept of layer-wise coordination for NMT, which explicitly coordinates the learning of hidden representations of the encoder and decoder together, layer by layer, gradually from low level to high level.
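As a toy illustration of the wiring such coordination implies (with trivial stand-in layers, not the paper's actual attention mechanism): instead of every decoder layer reading only the encoder's final state, decoder layer i reads the representation produced by encoder layer i.

```python
# Hypothetical stand-in layers: each encoder layer adds 1 to its input;
# each decoder layer adds its encoder-side context to its input.
enc_layers = [lambda h: h + 1 for _ in range(3)]
dec_layers = [lambda h, ctx: h + ctx for _ in range(3)]

def encode(x, layers):
    states, h = [], x
    for layer in layers:
        h = layer(h)
        states.append(h)      # keep every layer's representation, not just the last
    return states

def decode(y, layers, enc_states):
    h = y
    for layer, ctx in zip(layers, enc_states):
        h = layer(h, ctx)     # decoder layer i reads encoder layer i's output
    return h

out = decode(0, dec_layers, encode(0, enc_layers))
```

The design point is the pairing in `zip(layers, enc_states)`: low-level decoder layers see low-level source representations and high-level layers see high-level ones, which is the layer-by-layer coordination the abstract describes.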